Duration-embedded bi-HMM for expressive voice conversion

Authors

  • Chi-Chun Hsia
  • Chung-Hsien Wu
  • Te-Hsien Liu
Abstract

This paper presents a duration-embedded Bi-HMM framework for expressive voice conversion. First, Ward's minimum variance clustering method is used to cluster all the conversion units (sub-syllables), reducing both the number of conversion models and the size of the required training database. A duration-embedded Bi-HMM, trained with the EM algorithm, is then built for each sub-syllable class to convert neutral speech into emotional speech while taking duration information into account. Finally, prosodic cues are incorporated when modifying the spectrum-converted speech. The STRAIGHT algorithm is adopted for high-quality speech analysis and synthesis. Three target emotions are considered: happiness, sadness and anger. Objective and perceptual evaluations were conducted to compare the performance of the proposed approach with previous methods. The results show that the proposed method exhibits encouraging potential for expressive voice conversion.

Similar resources

Synthesizing Attitudes in German

Our study investigates the potential of modeling the vocal expression of human attitudes based on a limited set of prosodic and voice quality parameters and their subsequent synthetic realization. Four attitudes (uncertainty, sincerity, surprise and doubt) were taken into account. For two utterances, a set of acoustic prosodic (F0, intensity, duration) and voice quality parameters (jitter, shim...

Data-driven emotion conversion in spoken English

This paper describes an emotion conversion system that combines independent parameter transformation techniques to endow a neutral utterance with a desired target emotion. A set of prosody conversion methods have been developed which utilise a small amount of expressive training data ( 15 min) and which have been evaluated for three target emotions: anger, surprise and sadness. The system perfo...

On the use of Machine Learning in Statistical Parametric Speech Synthesis

Statistical parametric speech synthesis has recently shown its ability to produce natural sounding speech while keeping a certain flexibility for voice transformation without requiring a huge amount of data. This abstract presents how machine learning techniques such as Hidden Markov Models in generation mode or context oriented clustering with decision trees are applied in speech synthesis. Fi...

Interpolating Expressions in Unit Selection

In expressive speech synthesis, a key challenge is the generation of flexibly varying expressive tone while maintaining the high quality achieved with unit selection speech synthesis methods. Existing approaches have either concentrated on achieving high synthesis quality with no flexibility, or they have aimed at parametric models, requiring the use of parametric synthesis technologies such as...

Voice characteristics conversion for HMM-based speech synthesis system

In this paper, we describe an approach to voice characteristics conversion for an HMM-based text-to-speech synthesis system. Since this speech synthesis system uses phoneme HMMs as speech units, voice characteristics conversion is achieved by changing HMM parameters appropriately. To transform the voice characteristics of synthesized speech to the target speaker, we applied MAP/VFS algorithm to...

Publication date: 2005